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We present a strategy for a statistically rigorous Bayesian 
approach to the problem of determining cosmological param- 
eters from the results of observations of anisotropies in the 
cosmic microwave radiation background. We propose the ap- 
plication of Markov chain Monte Carlo methods, specifically 
the Metropolis-Hastings algorithm, to estimate the parame- 
ters. A complete statistical analysis is presented, with the 
Metropolis-Hastings algorithm described in detail. 
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I. INTRODUCTION 

Recent observations of the angular power spectrum 
of the anisotropy of the cosmic microwave background 
(CMB) have created much excitement. This data 

may be used to estimate cosmological parameters. The 
maximum in the angular power spectrum around the 
multipole value I w 200 is consistent with inflation and a 
flat universe However the apparent absence of a 

peak at I « 400 — 500 [Q| may imply that we have under- 
estimated the size of such cosmological parameters as the 
amount of normal baryonic matter or cold dark matter. 
This data, coupled with recent supernovae observations 
| pj| , seems to be leading us to seriously consider the ex- 
istence of a cosmological constant. Future observations 
will provide valuable cosmological information [^jl3| . 

The extraction of cosmological parameters from the 
CMB anisotropy data is a complicated and computer in- 
tensive activity. This is especially true as the number of 
cosmological parameters increases as the modal complex- 
ities grow. Current analysis exercises have included up to 
ten parameters; the scaler quadrupole and gravity wave 
perturbation normalizations {As and At), the scaler and 
tensor power-law indices for primordial perturbations [ug 
and Tit), the reionization optical depth r, the spatial cur- 
vature fifc, the energy densities for baryonic matter (Qt), 
cold dark matter (Qcdm), neutrinos (Q^) and the vacuum 
{^Ia) [0- Other logical cosmological parameters could 
include the Hubble constant, number of neutrino families 
and their masses, etc. 

Parameter estimation can be comprehensively de- 
scribed within the language of Bayesian statistics. Appli- 
cation of Bayes' theorem is well suited to astrophysical 
observations |15|. In Bayesian data analysis the model 
consists of a joint distribution over all unobserved (pa- 



rameters) and observed (data) quantities. One condi- 
tions on the data to obtain the posterior distribution of 
the parameters. The starting point of the Bayesian ap- 
proach to statistical inference is setting up a full proba- 
bility model that consists of the joint probability distri- 
bution of all observables, denoted by z = (zi , . . . , 2„) and 
unobservable quantities, denoted hy 9 — {9i, . . . ,6d). Us- 
ing the notion of conditional probability, this joint PDF 
/(z, 9) can be decomposed into the product of the PDF 
of all unobservables, f{9), referred to as the prior PDF 
of 9, and the conditional PDF of the observables given 
the unobservables, f{z\9), referred to as the sampling 
distribution or likelihood, i.e. 

/(z,0) = /(0)/(z|0). 

The prior PDF contains all the information about the 
unobservables that is known from substantive knowledge 
and expert opinion before observing the data. All the 
information about the 9 that stems from the experiment 
is contained in the likelihood. In the light of the data, 
the Bayesian paradigm then updates the prior knowledge 
about 9, f{9), to the posterior PDF of 0, /(0|z). This is 
done via an application of Bayes theorem through condi- 
tioning on the observations 



/(e|z) 



/(e,z) 
m(z) 



« /(e)/(z|0) 



where m(z) = / f{z\9)f{9)d9 is the marginal PDF of z 
which can be regarded as a normalizing constant as it is 
independent of 9. The Bayesian approach is based on the 
likelihood function but treats the parameters as random 
variables and assumes a joint prior distribution that sum- 
marizes the available information about the parameters 
before observing the data. In the light of the observa- 
tions, the information about the unknown parameters is 
then updated via Bayes theorem to the posterior distri- 
bution which is proportional to the product of likelihood 
and prior density ]l6[ |. 

The problem of marginalization of parameters has been 
the Achilles heel of Bayesian approaches to parameter 
estimation. The inevitable difflculty in calculating the 
multidimensional integrals necessary for the determina- 
tion of posterior probability distributions has hindered 
efforts. However, these impediments have been overcome 
by the progress made within the last decade in Bayesian 
computational technology via Markov chain Monte Carlo 
(MCMC) methods Since its initial application in 
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digital signal analysis [Q MCMC methods have revolu- 
tionized many areas of applied statistics and should have 
an impact for cosmological parameter estimation from 
CMB measurements. 

In this paper we do NOT implement a MCMC method 
for estimating cosmological parameters. This task is best 
left to the active scientists in the field. Instead we de- 
scribe how researchers may adapt their existing special- 
ized software in order to produce probability distribu- 
tions functions for the parameters that have been derived 
in a statistically rigorous manner. 

In section II we review methods that have been used 
to estimate cosmological parameters from CMB measure- 
ments, plus techniques for calculating likelihoods. In sec- 
tion III we describe the Bayesian approach to statistical 
inference and its implementation via Markov chain Monte 
Carlo methods. In section IV we describe a methods for 
applying MCMC methods to cosmological parameter es- 
timation with CMB data. Section V presents our con- 
clusions. 



II. CURRENT STATISTICAL METHODS WITH 
CMB DATA 

A. Parameter Estimation 

Various statistical approaches have attempted to esti- 
mate cosmological parameters from the CMB data. The 
maximum likelihood is an example ]T9|-^l[|. Likelihood 
techniques aim at finding the values of the parameters 
that maximize the likelihood function. However, the 
maximum of the likelihood is not necessarily the peak 
value of the parameter's marginal distribution (its PDF) 
or its mean. The maximum likelihood parameter esti- 
mates will differ from the Bayes estimators. The max- 
imum likelihood is not always good for parameter es- 
timation; for a given parameter, the maximization de- 
pends on whether or not other parameters have been in- 
tegrated out. A proper application of Bayes' theorem 
is attempted in some studies through the inclusion of 
prior distributions The typical approach is 

to select a range for the parameters and limit the max- 
imum likelihood technique to that region in parameter 
space. Sophisticated frequentist techniques are imple- 
mented in order to maximize the likelihood, such as the 
Levenberg-Marquardt method p^ , p^ . or the downhill sim- 
plex method |2g,||l. 

In the methods described above the problem of magi- 
nalizing over parameters is non-trivial. Typically the ap- 
proach is to marginalize over all parameters except the 
vacuum energy density (Qa) and the matter energy den- 
sity {Qm = ^b + ^cd7n)- This proccdurc gets prohibitively 
difficult as the number of parameters increases; in some 
cases the amount of computation can be expected to rise 
exponentially with parameter number. Hence, the tech- 
niques that have already been used will be difficult to 



apply to cosmological models with increasing complexity. 
Marginalizing over parameters will also become difficult if 
one no longer assumes that the variables are log-normally 
distributed, or if the cosmological CMB anisotropy signal 
is actually non-Gaussian p5| , p6| . 

Numerical maximization techniques used in maximum 
likelihood estimation, such as the Levenberg-Marquardt 
method, are only guaranteed to find a local maximum. 
Once they reached a local maximum they might get stuck 
in their search and not reach the global maximum. For 
this reason statisticians have applied simulating anneal- 
ing, which is a technique for global optimization. Simu- 
lated annealing has been attempted for cosmological pa- 
rameter estimation | |27| , p8t . Simulated annealing is re- 
lated to MCMC as its core component is the Metropolis 
Hastings algorithm. In this method the parameter space 
is searched in a random way. A new parameter space 
point is reached with a probability that depends on the 
likelihood and an effective temperature term. In the limit 
where the temperature approaches zero the thermody- 
namics of this parameter space search finds the system 
approaching the maximum of the likelihood. Stochas- 
tic methods such as these substitute deterministic inte- 
gration by a statistical estimation problem. Although 
these methods are applicable to high-dimensional prob- 
lems, they can be very inefficient in certain situations. 
The shortcomings of simulating annealing are that there 
is no guarantee that it will actually find the global max- 
imum in finite time. The efficiency depends very much 
on specifying a good cooling schedule which involves the 
arbitrary and skillful choice of various cooling parame- 
ters that specify the algorithm and determine its perfor- 
mance. 



B. Calculating the Likelihood 

Implicit in the problem of parameter estimation is the 
calculation of the likelihood, namely the probability of 
obtaining the data given a particular model (and its as- 
sociated parameters). The sizes of the data sets from 
CMB experiments are immense, so it is impractical to 
use all the raw data for producing the likelihood. In- 
stead, radical compression of the data is imperative [ p3[ . 
Even so, a precise derivation of the likelihood is very dif- 
ficult Approximate techniques have been 
developed that take as input the confidence intervals of 
the experimental results p3| , p9| -p2|. Software for calcu- 
lating the likelihoods, RAD PACK is freely available at 
http:/ /flight. uchicago. edu/knox/ radpack. htmi [^. 



In order to calculate the likelihood it is necessary to 
have the angular power spectrum of this CMB anisotropy 
for a specific model (with its associated cosmological pa- 
rameters). This calculational task is accomplished by 
code such as CMBfast Q or CAMB Q. This code ac- 
cepts the cosmological parameters as input, and returns 
the angular power spectrum of the CMB anisotropics, C/. 
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These software packages serve as the work-horses of cur- 
rent CMB statistical work. For example, the likelihood 
has been calculated 30,311,820 times in order to cover a 
region in a ten-dimensional cosmological parameter space 
WM. In other studies the likelihood was evaluated as 
needed within the calculation [^l|j2C| ]. 

With a large number of parameters it becomes imprac- 
tical to marginalize by integration. So when attempts are 
made to produce constraints in the JIa — flm plane ap- 
proximate marginalization techniques must be applied. 
One can estimate the parameters by accepting the val- 
ues from the maximum likelihood on the parameter grid 
p5|,Q, and then interpolate between grid points with a 
14 1 . Another technique is to maximize the 

parameters for a 



cubic spline 



likelihood with respect to all remaining 
fixed point in the 17 a — flm plane pl| , pO|] . These methods 
are further complicated due to degeneracies among the 
cosmological parameters, [p^ 



III. BAYESIAN POSTERIOR COMPUTATION 
VIA MCMC 

The difficulty with the Bayesian approach to parame- 
ter estimation is high-dimensional integration. To calcu- 
late the normalizing constant of the joint posterior PDF, 
for instance, requires d-dimensional integration. Having 
obtained the joint posterior PDF of 9, the posterior PDF 
of a single parameter 6i of interest can be obtained by 
integrating out all the other components, i.e. 



f{0,\z) = J ...J f{9\z)d9l...d9,-ld9,- 



.d9d. 



Calculation of the posterior mean of 9i necessitates a fur- 
ther integration, e.g. E[9i\z] — J 9if{9i\z)d9i. 

As the joint posterior is too complex to sample from 
directly, we propose to use a MCMC method ||l7| , ^ . In- 
stead of generating a sequence of independent samples 
from the joint posterior, in MCMC a Markov chain is 
constructed, whose equilibrium distribution is just the 
joint posterior. Thus, after running the Markov chain 
for a certain "burn- in" period, one obtains (correlated) 
samples from the limiting distribution, provided that the 
Markov chain has reached convergence. 

One method for generating a Markov chain is via the 
Metropolis-Hastings algorithm. The MH algorithm was 
developed by Metropolis et al. |^ and generalized by 
Hastings WOf. It is a MCMC method which means that it 
generates a Markov chain whose equilibrium distribution 
is just the target density, here the joint posterior PDF 
f{6\z). For ease of notation, let us assume that the tar- 
get density is some f{x), where x can be d-dimensional. 
Then a Markov chain is specified by constructing a tran- 
sition kernel P{x,A), giving the conditional probability 
to move from state cc to a point in A B, the Borel a- 
field on M"^. As the probability to stay in the current 
state X, P{x,{x}), is not necessarily 0, we assume that 
the transition kernel can be expressed as 



P{x, dy) = p{x, y)dy + r{x)Sx{dy) 

where p{x, x) — and Sx{dy) = 1 if x G dy, otherwise, 
and r(a;) — 1 — J p{x,y)dy. It is well known that if 
satisfies ^reversibility^ or 'detailed balance\ i.e. 

f{x)p{x,y) = f{y)p{y,x) 

then f{x) is the invariant density of the Markov chain. 
Together with irreducibility and aperiodicity, this gives 
a sufficient condition for the convergence of the Markov 
chain to its stationary distribution, see Tierney for 
further details. 

The MH algorithm shares the concept of a generat- 
ing density with the well-known simulation technique of 
rejection sampling. However, the candidate generating 
density q{y\x), J q{y\x)dy = 1, can now depend on the 
current state x of the sampling process, and instead of 
rigorously accepting or rejecting a new candidate y, it is 
accepted with a certain acceptance probability a{y\x) also 
depending on the current state x, and chosen such that 
the transition probability y) — q{y\x)a{'y\x) satisfies 
detailed balance. This is met by setting 



a[y\x)) = mm <^ ,,,,, , 1 
[f{x)q{y\x) 



if f{x)q{y\x)) > and a{y\x) = 1 otherwise. Note that 
irreducibility is guaranteed if q{y\x) is positive on the 
support of /. 

The steps of the MH algorithm are therefore: 

Step 0: Start with an arbitrary value xq 

Step i + I: Generate y from q{.\xi) and u from U{0, 1) 
If u < a{y\xi) set Xi+i = y (acceptance) 
If u > a{y\xi) set Xi+i — Xi (rejection) 

The MH algorithm does not require the normalization 
constant of the target density. The outcomes from the 
MH algorithm can be regarded as a sample from the in- 
variant density only after a certain 'burn-in' period. For 
issues concerning convergence diagnostics, the reader is 
referred to Cowles and Carlin |^^. Note that an impor- 
tant special case of the MH algorithm is the ^indepen- 
dence chain' where q{y\x) = q{y), i.e. a new candidate is 
generated independently of the current state x. 

Various methods to assess convergence, i.e. methods 
used for establishing whether an MCMC algorithm has 
converged and whether its output can be regarded as 
samples from the target distribution of the Markov chain, 
have been developed and implemented in CODA p3[ . 
CODA is a menu-driven collection of SPLUS functions 
for analyzing the output of the Markov chain. Besides 
trace plots and the usual tests for convergence, CODA 
calculates statistical summaries of the posterior distribu- 
tions and kernel density estimates. 
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MCMC techniques have been apphed in numerous ar- 
eas, from science to economics. Apphcations of state- 
space modehng in finance, e.g. stochastic volatihty mod- 
els apphed to time series of daily exchange rates or re- 
turns of stock exchange indices, easily have 1000-5000 pa- 
rameters and the Gibbs sampler shows slow convergence 
due to high posterior correlations [p|-|4^. Specially tai- 
lored MCMC algorithms, like multi-move Gibbs sam- 
plers or Metropolis-Hastings algorithms, can markedly 
improve the speed of convergence . 



IV. APPLYING MCMC METHODS TO CMB 
ANISOTROPY DATA 

It is possible for researchers to generate probability 
density functions for cosmological parameters from the 
CMB anisotropy data. This can be done in a Bayesian 
fashion with the MCMC serving as a means of conduct- 
ing a proper marginalization over parameters. The im- 
plementation of the MCMC method would be relatively 
straight forward. Instead of calculating the likelihood at 
uniform locations in the parameter space , one would 
let the MCMC do its intelligent walk through the space. 
Uniform a priori distributions for the parameters seems 
reasonable, so the MCMC would sample the parameter 
space defined by them. Since the likelihood function can 
not be written explicitly in terms of the cosmological pa- 
rameters, but instead in terms of the CMB anisotropy 
power spectra terms C/, it will be necessary to imple- 
ment a Metropolis-Hastings MCMC routine. 

The Markov chain would commence at a randomly se- 
lected position in parameter space {9^'' , . . . , O^j^^ ) . With 
the parameter set one would then utilize CMBfast ||3^] 
or CAMB to generate a set of CMB anisotropy 
angular power spectrum components (C;),.and a likeli- 
hood would then be calculated with CMB anisotropy 
data 1^,^,^,111 . New values for the parameters 

{o[^\ . . . , 0^^^) would be selected via sampling from the 
a priori distributions. However, these values would 
not necessarily be accepted as new values. First they 
would be used as input to CMBfast Q or CAMB 
[Q to generate a set of CMB anisotropy angular 
power spectrum components (Cj), and a likelihood 
would then be calculated with CMB anisotropy data 
p3| , p9 31 3^. The new values would be accepted or re- 
jected according the to following test; a random num- 
ber, u, would be generated between and 1. If 



u < min 

(1) 



(1) 



i'^)/f{z\e^^\...,0^P) (where 



f{z\6l'', . . . , 6'^'') is the likelihood in terms of data z and 
cosmological parameters . . . , 9d)) then the new pa- 
rameters is accepted into the chain, if not the next chain 
element has values equal to that of the previous state. A 
new set of parameters would then be randomly sampled 
from the a priori distributions and the procedure would 
continue. 



The generated chain of parameter values would form 
the set from which the statistical properties would be 
derived. After running the Markov chain for a certain 
" burn- in" period (in order for the Markov chain to reach 
convergence) one obtains (correlated) samples from the 
limiting distribution. This process would continue for 
a sufficiently long time (as determined by convergence 
diagnostics [^ ). 

After the burn-in the frequency of appearance of pa- 
rameters would represent the actual posterior density of 
the parameter. From the posterior density one can then 
create confidence intervals. Summary statistics are pro- 
duced from the distribution, such as posterior mean and 
standard deviation. A cross-correlation matrix is also 
easily produced; this would be of great importance for 
quantifying the "degeneracy" of flm and flm [0- A 
distinct advantage of the MCMC approach is that com- 
putational time scales linearly with parameter number. 
Hence, the MCMC approach to cosmological parame- 
ter estimation may provide the best strategy when test- 
ing complex models with numerous parameters. Also, 
MCMC methods could work with the exact form of the 
likelihood, although the approximate forms may prove to 



be sufficiently adequate 

The above method would constitute the simplest im- 
plementation of the Metropolis-Hastings method, that of 
an independence chain. We have just used an acceptance 
probability a{6*\9), defined by 



a{6*\6) = min 



f{z\e*)q{e\9*) 
f{z\e)q{e*\e) 



,1 



where the generating density q{6*\6) is the uniform den- 
sity over the parameter space and thus, in particular, 
independent of the current state. By using uniform pri- 
ors, the posterior PDFs in the acceptance probability 
calculation reduce to the likelihoods. However, the ef- 
ficiency of a Metropolis-Hastings algorithm depends cru- 
cially on the form of the generating density q{6*\6). Just 
using a uniform distribution that does not even depend 
on the current state 9 is the most simple but proba- 
bly most inefficient way to accomplish the task. Even 
with a uniform distribution, the algorithm will be irre- 
ducible/aperiodic/reversible and thus the Markov chain 
will converge towards its stationary distribution. 

A slightly better way might be to use a uniform dis- 
tribution in a neighborhood of the current 6. Any 
prior information could be useful, such as correlations 
that one could use to specify a multivariate normal cen- 
tered around the current with a covariance matrix 
that takes said correlations into account. The opti- 
mization of the Metropolis-Hastings MCMC strategy will 
inevitably require experimentation with the generating 
density q{6*\6). While this may require some detailed 
study, the benefit will be the ability to generate pos- 
terior distributions for a large number of cosmological 
parameters. 
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V. DISCUSSION 

A proper Bayesian approach to parameter estimation 
allows one to simultaneously estimate both dynamic sig- 
nals and measurement noise. MCMC methods have 
demonstrated their importance in Bayesian parameter 
estimation problems with large numbers of parameters. 
As cosmological models grow in complexity it will be- 
come necessary to use techniques such as those discussed 
here in order to handle marginalization of parameters. 
Furthermore, a MCMC approach is not restricted to the 
assumption of Gaussian noise. A heavy-tailed observa- 
tion error distribution such as a Student-i-distribution 
with large degrees of freedom might be more appropriate 
to allow for crude measurement errors and ensure that 
resulting estimates will be robust against additive out- 
liers. Non-Gaussian error distributions are readily incor- 
porated into MCMC calculations. We therefore strongly 
advocate the Bayesian approach via Metropolis-Hastings 
sampling in future analyses of cosmological parameter 
estimation from CMB data. 
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